Abstract

The emergence of large-scale sensor networks facilitates the collection of large amounts of real-time data to monitor and control complex engineering systems. However, in many cases the collected data may be incomplete or inconsistent, while the underlying environment may be time-varying or unformulated. In this paper, we develop a cognitive fault diagnosis framework that tackles these challenges. The framework investigates fault diagnosis in the model space instead of the signal space. Learning in the model space is implemented by fitting a series of models to a series of signal segments selected with a rolling window. By applying learning techniques in the fitted model space, faulty models can be discriminated from healthy models using a one-class learning algorithm. The framework enables us to construct a fault library when unknown faults occur, which can be regarded as cognitive fault isolation. This paper also theoretically investigates how to measure the pairwise distance between two models in the model space and incorporates the model distance into the learning algorithm in the model space. The results on three benchmark applications and one simulated model of the Barcelona water distribution network confirm the effectiveness of the proposed framework.



1 Introduction 

The smooth operation of complex engineering systems is crucial to modern society. To ensure the reliability, safety and availability of such systems, large amounts of real-time data are collected to detect and diagnose faults as early as possible. Designing an intelligent real-time system for fault diagnosis has therefore received considerable attention from both industry and academia.

The fault diagnosis procedure can be divided into the following three steps: (i) fault detection determines whether a fault has occurred or not; (ii) fault isolation aims to determine the type/location of the fault; and (iii) fault identification estimates the magnitude or severity of the fault. In some cases, the issues of fault isolation and fault identification are interwoven, since they both determine the type of fault that has occurred.

*The authors are with The Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, University of Birmingham, Birmingham B15 2TT, United Kingdom, email: {H.Chen, P.Tino, X.Yao, A.A.Rodan}@cs.bham.ac.uk.

In recent years, there has been a lot of research on the design and analysis of fault diagnosis schemes for different dynamic systems (for example, [1, 2]). A significant part of the research has focused on linear dynamical systems, where it is possible to obtain rigorous theoretical results. More recently, considerable effort has been devoted to the development of fault diagnosis schemes for nonlinear systems with various kinds of assumptions and fault scenarios [3, 4, 5].

These traditional fault diagnosis approaches rely, to a large degree, on a mathematical model of the "normal" system. If such a mathematical model is available, then fault diagnosis is achieved by comparing actual observations with the predictions of the model. Most autonomous fault diagnosis algorithms are based on this methodology. However, for complex engineering systems operating in unformulated or time-varying environments, such mathematical models may be inaccurate or unavailable altogether. Therefore, it is necessary to develop cognitive fault diagnosis methods based mainly on the collected real-time data.

In this contribution we present a novel framework that covers fault detection through to fault isolation when no, or only very limited, knowledge is available about the underlying system. We do not assume that we know the type, the number or the functional form of the faults in advance. The core idea is to transform the signal into a higher dimensional "dynamical feature space" via reservoir computation models and then represent varying aspects of the signal through variation in the linear readout models trained in such dynamical feature spaces. In this way, the part of the signal captured in a rolling window is represented by the reservoir model with the readout mapping fitted in that window.

Dynamic reservoirs of reservoir models have been shown to be 'generic' in the sense that they are able to represent a wide variety of dynamical features of the input driven signals, so that for a given task only the linear readout on top of the reservoir needs to be retrained [6]. Hence in our formulation, the underlying dynamic reservoir will be the same throughout the signal. The differences in the signal characteristics at different times will be captured solely by the linear readout models and will be quantified in the function space of readout models.

We assume that for some sufficiently long initial period the system is in a 'normal/healthy' regime, so that when a fault occurs the readout models characterizing the fault will be sufficiently 'distinct' from the normal ones. A variety of novelty/anomaly detection techniques can be used to detect deviations from the 'normal'. In this contribution we will use the one-class support vector machine (OCS) [7] methodology in the readout model space. As new faults occur in time, they will be captured by our incremental fault library building algorithm operating in the readout model space.

There have been other learning based approaches to fault detection and diagnosis, e.g. [8, 9, 10, 11]. For example, in [10], when the neural network is expanded or the topology of the network is changed to accommodate new faults or unexpected dynamics, the network should be retrained. Later on, Barakat et al. proposed to use a self-adaptive growing neural network for fault diagnosis [12]. They applied wavelet decomposition and used the variance and kurtosis of the decomposed signals as features. In 2009, Yelamos et al. [13] proposed to use support vector machines for fault diagnosis in chemical plants. Crucially, most of the current learning based approaches are formulated in the supervised learning framework, assuming that all fault patterns are known in advance. This can clearly be unrealistic.

The contributions of this paper are as follows: a) we propose a novel learning 
framework for cognitive fault diagnosis; b) the framework is based on learning 
in the model space (as opposed to the traditional data space) of readout models 
operating on the dynamic reservoir feature space representing parts of signals; 
c) we propose to use incremental one class learning in the readout model space 
for fault detection/isolation and dynamic fault library building. 

The rest of this paper is organized as follows. Section 2 introduces deterministic reservoir computing and the framework of "learning in the model space", followed by the incremental one class learning algorithm for cognitive fault diagnosis in Section 3. The experimental results and analysis are reported in Section 4. Finally, Section 5 concludes the paper and presents some future work.

2 Deterministic Reservoir Computing and Learning in the Model Space

This section introduces a deterministic reservoir model to fit multiple-input and multiple-output (MIMO) signals. Then, we introduce the framework of "learning in the model space" for fault diagnosis.

2.1 Deterministic Reservoir Computing 

Reservoir Computing (RC) [6] is a recent class of state space models based on a "fixed" randomly constructed state transition mapping, realized through a so-called reservoir, and a trainable (usually linear) readout mapping from the reservoir. Popular RC methods include Echo State Networks (ESNs) [2], Liquid State Machines [15] and the back-propagation decorrelation neural network [16].

In this paper, we will focus on Echo State Networks. ESNs are one of the simplest yet effective forms of RC. Generally speaking, ESNs are recurrent neural networks with a non-trainable sparse recurrent part (reservoir) and a simple linear readout. Typically, the reservoir connection weights as well as the input weights are randomly generated, subject to the "Echo State Property".

Figure 1: Illustration of the "learning in the model space" framework. The first stage is to fit models using the input-output signal, i.e. generate individual points in the model space. The second stage is to discriminate the faulty models from healthy models using discriminating learners.

The traditional randomized RC is largely driven by a series of randomized model building stages, which could be unstable and hard to understand, especially for fault diagnosis. In this paper, we propose to use a deterministic reservoir algorithm, i.e. simple cycle topology with regular jumps (CRJ) [17], to fit the signals for fault diagnosis, since CRJ can approximate any non-linear mapping with arbitrary accuracy. Due to the linear training, the CRJ model can be trained fast and run in real-time.
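As an illustration of a deterministic reservoir of the cycle-with-jumps kind, the following minimal sketch builds a fixed weight matrix and drives it with an input sequence. It is not the exact CRJ configuration used in this paper: the names `crj_reservoir`, `input_weights` and `run_reservoir`, the cycle weight `r_c`, the jump weight `r_j`, the jump size `jump`, the input scale `v`, the alternating input sign pattern and the tanh state update are all assumptions made for illustration. The linear readout would then be fitted on the collected states (for example by ridge regression), which is omitted here.

```python
import math


def crj_reservoir(n_units, r_c, r_j, jump):
    """Deterministic reservoir matrix: one-way cycle plus two-way jumps.

    All cycle edges share the weight r_c and all jump edges share r_j,
    so no random numbers are involved (illustrative construction).
    """
    W = [[0.0] * n_units for _ in range(n_units)]
    # Cycle: unit i feeds unit i + 1, wrapping around at the end.
    for i in range(n_units):
        W[(i + 1) % n_units][i] = r_c
    # Jumps: units 0, jump, 2 * jump, ... are linked in both directions.
    for start in range(0, n_units, jump):
        end = start + jump
        if end > n_units:
            break  # the last jump must not overshoot the start of the cycle
        end %= n_units
        W[end][start] = r_j
        W[start][end] = r_j
    return W


def input_weights(n_units, n_inputs, v):
    """Input weights of equal magnitude v with a fixed alternating sign
    pattern (an arbitrary deterministic choice for this sketch)."""
    return [[v if (i + k) % 2 == 0 else -v for k in range(n_inputs)]
            for i in range(n_units)]


def run_reservoir(W, V, inputs):
    """Drive the reservoir with a list of input vectors and collect the
    states x(t) = tanh(W x(t-1) + V u(t)), starting from the zero state."""
    n = len(W)
    x = [0.0] * n
    states = []
    for u in inputs:
        x = [math.tanh(sum(W[i][j] * x[j] for j in range(n))
                       + sum(V[i][k] * u[k] for k in range(len(u))))
             for i in range(n)]
        states.append(x)
    return states
```

Because every weight is fixed by a handful of scalars, two reservoirs built with the same arguments are identical, which is what allows readouts fitted on different windows to be compared directly.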

2.2 Learning in the Model Space 

Recently, there has been a new trend in the machine learning community to represent 'local' data collections through models that capture what we think is important in the data and to do machine learning on those models. This can have the benefit of more robust and more targeted learning on diverse data collections [18].

The idea of learning in the model space is to use models fitted on parts of the data as more stable and parsimonious representations of the data. Learning is then performed directly in the model space, instead of the original data space. Some aspects of the idea of learning in the model space have occurred in different forms in the machine learning community. For example, using generative kernels for classification (e.g. P-kernel [19] or Fisher kernel [20]) can be viewed as a form of learning in a model-induced feature space (see e.g. [21, 22]). Recently, Brodersen et al. [18] used a generative model of brain imaging data to represent fMRI measurements of different subjects in order to build an SVM-type learner that classifies these subjects into aphasic patients or healthy controls.

In this paper, we use the "learning in the model space" approach to represent chunks of signals by dynamic models (reservoir models with linear readouts) and perform learning in the model space of readouts. The framework is illustrated in Figure 1.

2.2.1 Distance in the Model Space



There are several ways to generate the model space from the original signal 
space. One possible way is to identify parameterized models with their param- 
eter vectors and work in the parameter space. This, however, will make the 
learning highly dependent on the particular model parameterization used. A 
more satisfying approach is to use parameterization-free notions of distance or 
similarities between the models. 



In the model space, the m-norm distance between models $f_1(x)$ and $f_2(x)$ ($f: \mathbb{R}^N \to \mathbb{R}^O$) is defined as follows:

$$L_m(f_1, f_2) = \left( \int_C D_m(f_1(x), f_2(x)) \, d\mu(x) \right)^{1/m},$$

where $D_m(f_1(x), f_2(x)) = \|f_1(x) - f_2(x)\|^m$ is a function to measure the difference between $f_1(x)$ and $f_2(x)$, $\mu(x)$ is the probability density function of the input domain $x$, and $C$ is the integral range. In this paper, we adopt $m = 2$ and first assume that $x$ is uniformly distributed. Of course, a non-uniform $\mu(x)$ can be adopted either by using samples generated from it or by estimating it directly using e.g. Gaussian mixture models.

In the following, we demonstrate the application of the distance definition in the model space for linear readout models. The readout model can be represented by the following equation

$$f(x) = Wx + a,$$

where $x = [x_1, \dots, x_N]^T$ is a state vector or basis function, $N$ is the number of input variables in the model, $W$ is the parameter matrix ($O \times N$) of the model, $O$ is the output dimensionality, and $a = [a_1, \dots, a_O]^T \in \mathbb{R}^O$ is the bias vector of the output nodes.

Consider two readouts from the same reservoir

$$f_1(x) = W_1 x + a_1,$$
$$f_2(x) = W_2 x + a_2.$$

Since the sigmoid activation function is employed in the domain of the readout, $C = [-1, 1]^N$. Then,

$$L_2(f_1, f_2) = \left( \int_C \|f_1(x) - f_2(x)\|^2 \, d\mu(x) \right)^{1/2} = \left( \int_C \|(W_1 - W_2)x + (a_1 - a_2)\|^2 \, dx \right)^{1/2} = \left( \int_C \|Wx + a\|^2 \, dx \right)^{1/2},$$

where $W = W_1 - W_2$, and $a = a_1 - a_2$.






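Note that for any fixed $a$ and $W$,

$$\int_C a^T W x \, dx = 0$$

in the integral range $C$. Therefore,

$$L_2(f_1, f_2) = \left( \int_C \|Wx\|^2 + \|a\|^2 \, dx \right)^{1/2} = \left( \int_C \sum_{i=1}^{O} (w_i^T x)^2 + \|a\|^2 \, dx \right)^{1/2} = \left( \frac{2^N}{3} \sum_{j=1}^{N} \sum_{i=1}^{O} w_{ij}^2 + 2^N \|a\|^2 \right)^{1/2},$$

where $w_i^T$ is the $i$-th row of $W$, and $w_{ij}$ is the $(i, j)$-th element of $W$.

Scaling the squared model distance $L_2^2(f_1, f_2)$ by $2^{-N}$, we obtain

$$\frac{1}{3} \sum_{j=1}^{N} \sum_{i=1}^{O} w_{ij}^2 + \|a\|^2,$$

which differs from the squared Euclidean distance of the readout parameters

$$\sum_{j=1}^{N} \sum_{i=1}^{O} w_{ij}^2 + \|a\|^2$$

by the factor 1/3 applied to the differences in the linear part $W$ of the affine readouts. Hence, more importance is given to the 'offset' than the 'orientation' of the readout mapping.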
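The closed form above can be checked numerically. The sketch below computes the unscaled distance for two affine readouts under a uniform measure on $[-1, 1]^N$ and compares it with a plain Monte Carlo estimate of the same integral. The function names and the example matrices are hypothetical and serve only as an illustration.

```python
import random


def affine_model_distance(W1, a1, W2, a2):
    """Closed-form L2 distance between f1(x) = W1 x + a1 and
    f2(x) = W2 x + a2 for x uniform on [-1, 1]^N (no 2^-N rescaling)."""
    n_in = len(W1[0])
    sq_w = sum((w1 - w2) ** 2
               for r1, r2 in zip(W1, W2) for w1, w2 in zip(r1, r2))
    sq_a = sum((p - q) ** 2 for p, q in zip(a1, a2))
    # (2^N / 3) * sum of squared weight differences + 2^N * ||a1 - a2||^2
    return (2 ** n_in * (sq_w / 3.0 + sq_a)) ** 0.5


def monte_carlo_distance(W1, a1, W2, a2, n_samples=50000, seed=0):
    """Monte Carlo estimate of the same integral, used only as a check."""
    rng = random.Random(seed)
    n_in = len(W1[0])
    total = 0.0
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_in)]
        for r1, r2, p, q in zip(W1, W2, a1, a2):
            diff = sum((w1 - w2) * xi for w1, w2, xi in zip(r1, r2, x))
            diff += p - q
            total += diff * diff
    # Sample mean times the volume 2^N approximates the integral over C.
    return (2 ** n_in * total / n_samples) ** 0.5
```

For readouts that differ only in their bias, the distance reduces to $2^{N/2} \|a_1 - a_2\|$, which illustrates the larger weight given to the offset.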

In the above, we assumed that the distribution of $x$ is uniform in the integral range $C$. As mentioned before, in the case of a non-uniform $\mu(x)$, we can either use samples generated from $\mu$ or estimate it analytically using e.g. a Gaussian mixture model.

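Assume we have $m$ sampled points $x_i$, $i = 1, 2, \dots, m$, from $\mu$. Then

$$L_2(f_1, f_2) = \left( \int_C \|f_1(x) - f_2(x)\|^2 \, d\mu(x) \right)^{1/2} \approx \left( \frac{1}{m} \sum_{i=1}^{m} \|f_1(x_i) - f_2(x_i)\|^2 \right)^{1/2}. \qquad (2)$$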
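A minimal sketch of the sample-based estimate in Equation (2) for two affine readouts is given below. The function name and the list-based representation of the readouts and samples are assumptions made for illustration; in the proposed framework the samples would be reservoir activations.

```python
def sampled_model_distance(W1, a1, W2, a2, samples):
    """Estimate L2(f1, f2) from points x_1, ..., x_m drawn from mu:
    the root of the mean squared difference between the two readouts."""
    total = 0.0
    for x in samples:
        for r1, r2, p, q in zip(W1, W2, a1, a2):
            # One output coordinate of f1(x) - f2(x).
            diff = sum((w1 - w2) * xi for w1, w2, xi in zip(r1, r2, x))
            diff += p - q
            total += diff * diff
    return (total / len(samples)) ** 0.5
```

With the two sample points `[1.0, 0.0]` and `[0.0, 1.0]`, readout `W1 = [[1.0, 0.0]]` and a zero second readout, the squared differences are 1 and 0, so the estimate is the square root of 0.5.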
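Alternatively, a Gaussian mixture model can be employed to represent $\mu$,

$$\mu(x) = \sum_{i=1}^{K} \alpha_i \, \mu_i(x \mid \eta_i, \Sigma_i), \quad \text{and}$$

$$\mu_i(x \mid \eta_i, \Sigma_i) = \frac{\exp\left( -\frac{1}{2} (x - \eta_i)^T \Sigma_i^{-1} (x - \eta_i) \right)}{(2\pi)^{d/2} \, |\Sigma_i|^{1/2}},$$

where $\sum_{i=1}^{K} \alpha_i = 1$ and $d$ is the dimensionality of $x$.

Then, the distance $L_2(f_1, f_2)$ can be obtained as follows:

$$L_2(f_1, f_2) = \left( \int_C \|f_1(x) - f_2(x)\|^2 \, d\mu(x) \right)^{1/2} = \left( \sum_{i=1}^{K} \alpha_i \left( \mathrm{trace}(W^T W \Sigma_i) + \eta_i^T W^T W \eta_i + 2 a^T W \eta_i + a^T a \right) \right)^{1/2}. \qquad (3)$$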
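The mixture-based expression in Equation (3) can be evaluated directly from the mixture parameters. The following sketch assumes that $W$ and $a$ are already the differences of the two readouts and that the integral range covers essentially all of the mixture mass. The function name and the argument layout are hypothetical.

```python
def gmm_model_distance(W, a, weights, means, covs):
    """Distance of Equation (3) for W = W1 - W2 and a = a1 - a2 under a
    Gaussian mixture with the given weights, means and covariances."""
    total = 0.0
    for alpha, eta, cov in zip(weights, means, covs):
        # trace(W^T W Sigma) equals the sum over rows w of w^T Sigma w.
        tr = sum(wi * cov[i][j] * wj
                 for row in W
                 for i, wi in enumerate(row)
                 for j, wj in enumerate(row))
        # eta^T W^T W eta + 2 a^T W eta + a^T a equals ||W eta + a||^2.
        shifted = [sum(w * e for w, e in zip(row, eta)) + b
                   for row, b in zip(W, a)]
        total += alpha * (tr + sum(s * s for s in shifted))
    return total ** 0.5
```

For a single one-dimensional component with mean 1 and variance 0.5, `W = [[2.0]]` and `a = [1.0]`, the trace term is 2 and the shifted term is 9, so the distance is the square root of 11.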



3 Incremental One Class Learning for Cognitive 
Fault Diagnosis 

In fault diagnosis, it should be determined whether a running sub-system/component 
is in a normal operation condition, or whether a faulty situation is occurring. 
It is relatively cheap and simple to obtain measurements from a normally work- 
ing system (although sampling from all possible normal situations might still 
be expensive). In contrast, sampling from faulty situations requires the system 
to break down in various ways to obtain faulty measurement examples. The 
construction of a fault library will therefore be very expensive, or completely 
impractical. In this section, we focus on this challenge and aim to develop 
an algorithm that can identify unknown faults and construct a fault library 
dynamically, which will facilitate fault isolation based on this library. 

Based on the "learning in the model space" framework (Figure 1), one-class learning [7] will be employed in the model space for fault diagnosis. One-class classification is a special type of classification algorithm. One-class SVMs seek a hyperplane that has maximal distance to the origin in the kernel feature space, with the given training examples falling beyond the hyperplane.

Note that the signal characteristics can change at different positions of the rolling window. That means that the underlying measure $\mu$ over reservoir activations $x$ can change. Consider two readouts $f_i$ and $f_j$ obtained from two rolling window positions $i$ and $j$. If reservoir activations in positions $i$ and $j$ are considered, we would obtain two distances $L_{\mu_i}(f_i, f_j)$ and $L_{\mu_j}(f_i, f_j)$, respectively.

1 The measure $\mu_k$ will be represented by reservoir activation samples at window position $k$.
Algorithm 1 Incremental One Class Learning for Cognitive Fault Detection
1: Input: multiple-input and multiple-output data stream $s_1, \dots, s_t, s_{t+1}, \dots$, where $s_t = (u_1, \dots, u_V, y_1, \dots, y_O)^T$, $V$ is the number of signal inputs and $O$ is the number of outputs. The data segment $s_1, \dots, s_t$ are normal states of the system; parameters ($\sigma$ and $\nu$) of one-class SVMs; window size $m$.
2: Output: model library lib.
3: for each sliding window $(s_i, \dots, s_{i+m-1})$, $1 \le i \le t + 1 - m$ do
4:     Fit deterministic reservoir computing model:
5:     $\mathrm{drc}(s_i, \dots, s_{i+m-1}) \to f_i$
6: end for
7: Calculate the pairwise model distance matrix $L_2(f_i, f_j)$, $1 \le i, j \le t + 1 - m$, according to the model distance of Section 2.2.1
8: Apply one class SVMs: $\mathrm{OCS}(L_2, \sigma, \nu) \to \Theta_0$ and add $\Theta_0$ to the model library lib = $\{\Theta_0\}$.
9: for sliding window $(s_j, \dots, s_{j+m-1})$, $j > t$ do
10:     $\mathrm{drc}(s_j, \dots, s_{j+m-1}) \to f_j$;
11:     if $f_j$ belongs to a known fault in the lib then
12:         update $\Theta_k$ with $f_j$ and empty candidate pool;
13:     else
14:         put $f_j$ in the candidate pool;
15:     end if
16:     if size of candidate pool > 0.5 * m then
17:         build a new model $\Theta_{k+1}$ with candidate pool
18:         add $\Theta_{k+1}$ to lib and empty candidate pool
19:     end if
20: end for



The distance between $f_i$ and $f_j$ based on the sampling approach is then

$$L_2(f_i, f_j) = L_{\mu_i}(f_i, f_j) + L_{\mu_j}(f_i, f_j).$$

In this paper, we propose an algorithm that can construct the fault library online. The idea is to use one one-class learner to represent each fault/sub-fault segment by using the "learning in the model space" approach. In the beginning, a normal one-class learner $\Theta_0$ will be constructed based on the normal signal segments. With the rolling window moving forward, we continually apply $\Theta_0$ to judge whether a fault occurs. If a fault is detected, we will train a new one-class learner $\Theta_i$ for fault $i$. Then, we keep monitoring the signal and determine whether the ongoing signal segment belongs to either the normal state or a known fault. If not, a new one-class learner will be built and included in the model library. The algorithm is presented in Algorithm 1, which includes the following major steps:

1. Normal data preparation by applying the deterministic reservoir model drc to the rolling windows (size $m$) in the first $t$ steps, i.e. the "normal" regime is sequentially induced. (Lines 3-6)

2. Calculate the pairwise model distance matrix $L_2(f_i, f_j)$ and employ one-class SVMs (OCS) to obtain the normal class $\Theta_0$. (Lines 7-8)

In one-class SVMs, the Gaussian RBF kernel is employed with the data distance replaced by the model distance $L_2(f_i, f_j)$:

$$K(f_i, f_j) = \exp\{-\sigma \cdot L_2(f_i, f_j)\}.$$

3. With the rolling window moving forward, if a new $f_j$ belongs to an existing model $\Theta_k$, update the existing $\Theta_k$ with this new data $f_j$ and empty the candidate pool. Otherwise, put the "point" $f_j$ in the candidate pool. (Lines 9-15)

4. If the number of data points in the candidate pool exceeds half of the window size $m$, construct a new one-class learner $\Theta_{k+1}$ and empty the candidate pool. (Lines 16-18)
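The control flow of the steps above can be sketched as follows. To keep the example self-contained, the one-class SVM is replaced by a much simpler stand-in that accepts a model when its mean kernel similarity to the stored members exceeds a threshold; this is not the learner used in the paper. The class name `OneClassStandIn`, the function `build_library`, the `threshold` parameter and the tie-breaking rule (prefer the learner that accepted the previous point, otherwise the most recent matching learner) are assumptions made for illustration.

```python
import math


class OneClassStandIn:
    """Simplified stand-in for a one-class SVM (illustration only)."""

    def __init__(self, members, distance, sigma, threshold):
        self.members = list(members)
        self.distance = distance
        self.sigma = sigma
        self.threshold = threshold

    def score(self, f):
        # Mean kernel similarity exp(-sigma * L(f, g)) to the stored models.
        sims = [math.exp(-self.sigma * self.distance(f, g))
                for g in self.members]
        return sum(sims) / len(sims)

    def accepts(self, f):
        return self.score(f) >= self.threshold

    def update(self, f):
        self.members.append(f)


def build_library(normal_models, stream_models, distance,
                  sigma, threshold, pool_limit):
    """Incremental library building; pool_limit plays the role of 0.5 * m."""
    def new_learner(models):
        return OneClassStandIn(models, distance, sigma, threshold)

    library = [new_learner(normal_models)]  # learner 0 is the normal class
    pool, labels, last = [], [], 0
    for f in stream_models:
        matches = [k for k, learner in enumerate(library)
                   if learner.accepts(f)]
        if matches:
            # Several learners may accept f; prefer the previous one.
            k = last if last in matches else matches[-1]
            library[k].update(f)
            pool = []
            labels.append(k)
            last = k
        else:
            pool.append(f)
            labels.append(None)
        if len(pool) > pool_limit:
            library.append(new_learner(pool))
            last = len(library) - 1
            pool = []
    return library, labels
```

With one-dimensional "models" near 0 as the normal class and a stream that moves to values near 5, the first few points near 5 fill the candidate pool, a second learner is created once the pool exceeds the limit, and later points near 5 are assigned to that learner.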

In the above algorithm, the assumption is that the system is running normally in the first $t$ steps. Although the window size $m$ should be relatively large (e.g. > 300 time steps) to accurately fit the dynamic models (e.g. deterministic reservoir computing in this paper), the rolling window is moved forward by one time step at a time, which reduces fault detection delays.

4 Experimental Studies 

This section presents experimental results in four fault diagnosis scenarios, which include one synthetic nonlinear auto-regressive moving average (NARMA) system with three different signals, one Van der Pol oscillator with three faults imposed, one benchmark three-tank system with three faults and the Barcelona water system with 31 faults. We investigate the fault detection ability and fault isolation ability of a number of approaches.

4.1 Experimental Settings 

In our experiments, to evaluate the "learning in the model space" framework for fault diagnosis, a number of approaches have been adopted for comparison. The approaches include: Hotelling's T-squared statistic test (T2) [23], a density-based algorithm for discovering clusters in large spatial databases with noise (DBscan) [24], affinity propagation [25] in the model space (AP-Model), affinity propagation in the signal space (AP-Signal), one class SVMs [7] in the model space (OCS-Model), one class SVMs in the signal space (OCS-Signal), autoregressive-moving-average model with exogenous inputs with incremental one-class learner (ARMAX-OCS), reservoir computing with incremental one-class learner (RC-OCS), deterministic reservoir computing with incremental one-class learner (DRC-OCS) and DRC-OCS (sampling), where the model distance matrix is estimated by the sampling method (Equations (2) and (3)). Table 1 summarizes all the algorithms employed in this paper.

Table 1: Algorithms and Parameters

Algorithm           Space   Parameters
T2                  signal
DBscan              model   k: number of neighbors; $\epsilon$: neighborhood radius
AP-Model            model
AP-Signal           signal
OCS-Model           model   $\sigma$: Gaussian kernel parameter; $\nu$: the upper bound of outliers
OCS-Signal          signal  $\sigma$: Gaussian kernel parameter; $\nu$: the upper bound of outliers
ARMAX-OCS           model   $\sigma$: Gaussian kernel parameter; $\nu$: the upper bound of outliers; m: number of nodes in reservoir (25); p: autoregressive terms; q: moving average terms; b: exogenous input terms
RC-OCS              model   $\sigma$: Gaussian kernel parameter; $\nu$: the upper bound of outliers; m: number of nodes in reservoir (25)
DRC-OCS (sampling)  model   $\sigma$: Gaussian kernel parameter; $\nu$: the upper bound of outliers; m: number of nodes in reservoir (25)
DRC-OCS             model   $\sigma$: Gaussian kernel parameter; $\nu$: the upper bound of outliers; m: number of nodes in reservoir (25)

2 If the new point $f_j$ is classified to more than one model by one-class SVMs, count the point in the last model because of sequential correlation.

The signal space is generated by selecting $p$ consecutive points, i.e. $\{s_t, \dots, s_{t+p-1}\}$, where $s_t = (u_1, \dots, u_V, y_1, \dots, y_O)^T$, as a training point by rearranging these $p$ points into one vector. The order $p$ will be selected in the range [1, 30].
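The construction of signal-space training points can be sketched as follows; the function name and the list-of-lists representation of the data stream are assumptions made for illustration.

```python
def signal_space_points(stream, p):
    """Concatenate every run of p consecutive observation vectors
    s_t, ..., s_{t+p-1} into a single training vector."""
    return [[value for s in stream[t:t + p] for value in s]
            for t in range(len(stream) - p + 1)]
```

For a stream of three two-dimensional observations and `p = 2`, this yields two four-dimensional training points, each sharing one observation with its neighbor.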

In the following four data sets, we generate 3000 time steps for the normal signal and for each fault signal, and employ a rolling window (size 500) to generate a series of data segments, which are used to train the deterministic reservoir models. In each data set, the first 1000 time steps of the signal are normal, i.e. the first 500 models are normal with window size 500.

The parameters of DBscan are optimized by minimizing the number of dis- 
covered classes and the false alarm rates using the first 500 normal points. The 
parameters of ARMAX are selected by minimizing the normalized mean squared 
error (NMSE) in the first 1000 time steps. The parameters of one class SVMs 
in OCS-Model, OCS-Signal, ARMAX-OCS, RC-OCS and DRC-OCS will be 
optimized by 5-fold cross validation using the first 500 data points. 

4.2 NARMA System 

Figure 3: Visualization of the NARMA data set in the model space (top) and signal space (p = 30) (bottom) by multi-dimensional scaling (MDS).

In NARMA, the current output depends on both the input and the previous output. Generally speaking, it is difficult to model this system due to high nonlinearity and possibly long memory. In this paper, we employed three NARMA time series with orders O = 10, 20, 30 that are given by Equations (4), (5) and (6), respectively.
$$y(t+1) = 0.3\,y(t) + 0.05\,y(t) \sum_{i=0}^{9} y(t-i) + 1.5\,u(t-9)\,u(t) + 0.1, \qquad (4)$$

$$y(t+1) = \tanh\left( 0.3\,y(t) + 0.05\,y(t) \sum_{i=0}^{19} y(t-i) + 1.5\,u(t-19)\,u(t) + 0.01 \right) + 0.2, \qquad (5)$$

$$y(t+1) = 0.2\,y(t) + 0.004\,y(t) \sum_{i=0}^{29} y(t-i) + 1.5\,u(t-29)\,u(t) + 0.201, \qquad (6)$$

where $y(t)$ is the system output at time $t$ and $u(t)$ is the system input at time $t$ ($u(t)$ is an i.i.d. stream generated uniformly in the interval [0, 0.5)).
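As an illustration, an order-10 NARMA sequence of the form of Equation (4) can be generated as follows. The function name, the zero initial outputs and the use of a seeded generator are assumptions of this sketch, not settings reported in the paper.

```python
import random


def narma10(n_steps, seed=0):
    """Generate input u and output y of an order-10 NARMA system,
    starting from zero outputs for the first ten time steps."""
    rng = random.Random(seed)
    u = [rng.uniform(0.0, 0.5) for _ in range(n_steps)]
    y = [0.0] * n_steps
    for t in range(9, n_steps - 1):
        # Sum of the ten most recent outputs y(t), ..., y(t - 9).
        window_sum = sum(y[t - i] for i in range(10))
        y[t + 1] = (0.3 * y[t] + 0.05 * y[t] * window_sum
                    + 1.5 * u[t - 9] * u[t] + 0.1)
    return u, y
```

The order-20 and order-30 systems follow the same pattern with the longer sums and the constants of Equations (5) and (6).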

The three sequences are illustrated in Figure 2. The three NARMA sequences look quite similar, and it is very difficult to separate them based on the signal only.

Figure 3 shows the MDS analysis of the NARMA data set in the model space (top) and in the signal space (bottom). Based on this figure, it is relatively easy to separate the different classes in the model space, while most of the data points overlap in the signal space. The figure confirms that the model based representation is able to effectively represent the signals. In Table 3, several supervised classification techniques have been employed to confirm the benefits of using model space based approaches.

4.3 Van der Pol Oscillator 

A Van der Pol oscillator [26] has been a subject of extensive research, and its discrete-time expressions play an important role in numerical investigations. The discrete-time Van der Pol oscillator can be obtained as follows:

$$y_1(k) = y_2(k-1)\,\Delta t + y_1(k-1),$$
$$y_2(k) = y_2(k-1) + y_2(k-1)\left(1 - y_1(k-1)^2\right)\Delta t - y_1(k-1)\,\Delta t + \epsilon,$$

where $\epsilon$ is Gaussian white noise with variance 0.01.

Three faults are imposed on the Van der Pol oscillator by adding $0.75\sin(y_1(k-1))\,\Delta t$, $0.75\tanh(y_1(k-1))\,\Delta t$ and $0.75\cos(y_1(k-1)^2)$ to $y_2(k)$. The Van der Pol oscillator and the three faults are illustrated in Figure 4.

Figure 4: Illustration of Van der Pol oscillator and three different faults (top: $y_1(k)$, bottom: $y_2(k)$).

3 Multidimensional scaling (MDS) aims to preserve the pairwise distance between points, which is suitable to preserve the model distance for visualization.
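A minimal simulation of a discrete-time Van der Pol oscillator with an optional additive fault on the second state is sketched below. The step size `dt`, the initial state and the function names are assumptions, since they are not given in this section; the fault shown is the sine fault, and the other faults can be passed in the same way.

```python
import math
import random


def simulate_van_der_pol(n_steps, dt=0.01, fault=None, noise_var=0.01,
                         seed=0, y1_0=1.0, y2_0=0.0):
    """Iterate the discrete-time oscillator for n_steps points.

    fault, if given, is a function of (previous y1, dt) whose value is
    added to the second state at every step.
    """
    rng = random.Random(seed)
    noise_sd = math.sqrt(noise_var)
    y1, y2 = [y1_0], [y2_0]
    for _ in range(1, n_steps):
        p1, p2 = y1[-1], y2[-1]
        new_y1 = p2 * dt + p1
        new_y2 = (p2 + p2 * (1.0 - p1 ** 2) * dt - p1 * dt
                  + rng.gauss(0.0, noise_sd))
        if fault is not None:
            new_y2 += fault(p1, dt)
        y1.append(new_y1)
        y2.append(new_y2)
    return y1, y2


def sine_fault(y1_prev, dt):
    """Additive fault 0.75 * sin(y1(k - 1)) * dt."""
    return 0.75 * math.sin(y1_prev) * dt
```

Calling `simulate_van_der_pol(3000)` and `simulate_van_der_pol(3000, fault=sine_fault)` gives a normal and a faulty trajectory of the length used in the experiments.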






4.4 Three Tank System 
Figure 5: Three tank system [3].

Figure 6: Illustration of levels in three tanks in the three tank system and three different faults (left: tank 1, middle: tank 2, right: tank 3).

A well-known three-tank problem [3], shown in Figure 5, is presented to illustrate the effectiveness of the proposed algorithm. The cross-section of these tanks is $A_i = 1\,\mathrm{m}^2$, and there is a cross-section $A_p = 0.1\,\mathrm{m}^2$ at the end of each tank. The outflow rate is $c_j$, $j = 1, \dots, 3$. The level of each tank is denoted by $x_i$ ($0 \le x_i \le 10$, $i = 1, \dots, 3$).

The input flows by two pumps are denoted by $u_i$ with the restrictions $0 \le u_i \le 1\,\mathrm{m}^3/\mathrm{s}$, $i = 1, 2$. In this paper, the inflows are set with $u_1(k) = 0.2\cos(0.3 k T_s) + 0.3$ and $u_2(k) = 0.25\cos(0.5 k T_s) + 0.3$, respectively, and the initial levels of the tanks are 8, 6.5, and 5 m. In the model, three faults are introduced as follows:

1) Actuator fault in pump 1: the pump is partially or fully shut down.

2) Leakage in tank 3: there is a circular leak hole with unknown radius $0 < \rho_3 < 1$ in the tank bottom.

3) Actuator fault in pump 2: the fault is the same as fault 1 but related to pump number 2.

Figure 6 illustrates the water levels of the three tanks in the normal and the three faulty situations.

4.5 Barcelona Water Distribution Network 







Figure 7: Barcelona Water System Simulator Programmed by MATLAB 
Simulink [27]. 

The next application is the Barcelona Water Distribution Network (BWDN) [27]. BWDN supplies water to approximately 3 million consumers, distributed in 23 municipalities in a $424\,\mathrm{km}^2$ area. Water can be taken from both surface and underground sources. From these sources, water is supplied to 218 demand sectors through about 4645 km of pipe. The complete transport network has been modeled using 63 storage tanks, 3 surface and 6 underground sources, 79 pumps, 50 valves, 18 nodes and 88 demands.

A detailed simulation model of the BWDN has been developed using MATLAB/Simulink [27] (Figure 7), which has been calibrated and validated using real data. In this simulator, we can manipulate and inject different faults into the system. The studied faults are introduced in the two subsystems of the network shown in Figure 8. In the two subsystems, we introduced 31 faults, which are detailed in Table 2. These faults include actuator faults, actuator sensor faults, demand (input) sensor faults, and tank (output) sensor faults. Four examples of faulty signals are illustrated in Figure 9.

Figure 8: Subsystems of the water network where faults are introduced. iOrioles, iStaClmCervello, iCesalpina1, iCesalpina2 are actuators (controller). c175LOR, c147SCC, c205CES, c263CES are demands (input). d175LOR, d147SCC, d205CES, d263CES are tank levels (output).

As there are two subsystems, two deterministic reservoir computing mod- 
els, each with 25 nodes in the reservoir, have been employed in the proposed 
framework. 

4.6 Comparisons and Evaluations 

This section first reports the comparisons of several supervised algorithms applied in the model space and signal space, respectively, and then evaluates the algorithms listed in Table 1 in terms of fault detection ability and fault isolation ability.

In the sections above, the model space and signal space have been illustrated by the MDS algorithm. However, due to the high dimensionality, the visualizations might not reveal the real relationship of these data points in the high dimensional space. In order to compare the model space and signal space based approaches, Table 3 reports the comparisons of the representations of model space and signal space using a number of supervised learning algorithms, including classification and regression trees (CART), support vector machines (SVMs), one class support vector machine (OCS), Bagging (100 trees) and Adaboosting (100 trees).

In the signal space approach, the order p will be selected in the range [1, 30] 
by 5-fold cross validation approach. The parameters of SVMs and one-class 






Table 2: Parameterizations of faults. MFD stands for maximum flow/demand.

ID  Faulty Element   Type  Magnitude   ID  Faulty Element   Type  Magnitude
1   iOrioles         1     -25%        17  iStaClmCervello  3     0.01%
2   iOrioles         2     -25%        18  iStaClmCervello  4     0.5%
3   iOrioles         2     -10%        19  iStaClmCervello  5
4   iOrioles         3     0.001%      20  iStaClmCervello  6     4
5   iOrioles         3     0.1%        21  iCesalpina1      1     10%
6   iOrioles         4     10%         22  iCesalpina1      2     -15%
7   iOrioles         4     1%          23  iCesalpina1      3     0.01%
8   iOrioles         5                 24  iCesalpina1      4     0.75%
9   iOrioles         6     2           25  iCesalpina1      5
10  c175LOR          1     -20%        26  iCesalpina1      6     0.75
11  c175LOR          2     -15%        27  c263CES          1     30%
12  c175LOR          3     0.01%       28  c263CES          2     -15%
13  c175LOR          4     1%          29  c263CES          3     0.025%
14  c175LOR          5                 30  c263CES          4     0.5%
15  iStaClmCervello  1     -15%        31  c263CES          5
16  iStaClmCervello  2     -7.5%

Type  Details & Parameter                Type  Details & Parameter
1     Additive offset (%MFD)             4     Additive drift (%MFD)
2     Additive incipient offset (%MFD)   5     Abrupt freezing (-)
3     Noise (variance %MFD)              6     Multiplicative offset (divided by)
Table 3: Comparisons of model space based approach and signal based approach using supervised learning techniques. The reported results are based on 10 runs of 5-fold cross validation.

Algorithm  NARMA                   Van der Pol             Three Tank              Water
           Model       Signal      Model       Signal      Model       Signal      Model       Signal
CART       0.00(0.00)  0.33(0.01)  0.07(0.01)  0.11(0.01)  0.01(0.00)  0.02(0.00)  0.06(0.01)  0.11(0.00)
SVMs       0.00(0.00)  0.07(0.01)  0.05(0.01)  0.07(0.01)  0.00(0.00)  0.00(0.00)  0.06(0.00)  0.14(0.00)
OCS        0.04(0.01)  0.32(0.01)  0.15(0.01)  0.27(0.01)  0.02(0.01)  0.10(0.01)  0.09(0.01)  0.23(0.00)
Bagging    0.00(0.00)  0.24(0.01)  0.01(0.00)  0.07(0.00)  0.00(0.00)  0.01(0.01)  0.04(0.01)  0.08(0.01)
Boosting   0.00(0.00)  0.33(0.01)  0.15(0.01)  0.22(0.01)  0.01(0.00)  0.04(0.00)  0.07(0.01)  0.16(0.00)

Table 4: Comparisons of several algorithms in terms of fault detection ability,
i.e. fault detection rate (FDR) and false alarm rate (FAR).

                   NARMA           Van der Pol     Three Tank      Barcelona Water
Algorithm          FDR     FAR     FDR     FAR     FDR     FAR     FDR     FAR
-----------------  ------  ------  ------  ------  ------  ------  ------  ------
T2                 0.9072  0.1000  0.3009  0.0998  0.2311  0.0999  0.2316  0.1384
DBscan             1       0.0917  0.9146  0.2317  0.8958  0.0683  0.7981  0.1368
OCS-Model          1       0.1102  0.9310  0.0509  0.8521  0.1082  0.9313  0.2683
OCS-Signal         0.7042  0.2097  0.7686  0.2104  0.7521  0.2082  0.4920  0.3796
AP-Model           1               1.0000  0.3405  0.8407  0.1128  0.9014  0.2678
AP-Signal          1       0.5427  1.0000  0.7405  0.7155  0.2387  0.8879  0.2458
ARMAX-OCS          0.9882  0.0517  0.8727          0.9776          0.7369  0.1588
RC-OCS             0.9747  0.0558  0.9762  0.0158  0.8387          0.8271  0.1079
DRC-OCS(Sampling)  0.9789          0.9804          0.9926          0.9327  0.0817
DRC-OCS            0.9921          0.9818          0.9919          0.9762  0.0473

Table 5: Comparisons of several algorithms in terms of fault isolation ability.

                   NARMA (3 classes)                        Van der Pol (4 classes)
Algorithm          Classes  Precision  Recall  Specificity  Classes  Precision  Recall  Specificity
-----------------  -------  ---------  ------  -----------  -------  ---------  ------  -----------
DBscan             4        0.6690     0.7650  0.8825       10       0.7629     0.6842  0.8018
AP-Model           271      0.9699     0.9698  0.9899       367      0.8778     0.8757  0.9585
ARMAX-OCS          5        0.9354     0.9229  0.9615       2        0.4309     0.4880  0.7868
RC-OCS             3        0.9637     0.9615  0.9808       6        0.9606     0.9583  0.9861
DRC-OCS(Sampling)  3        0.9683     0.9692  0.9914       5        0.9617     0.9726  0.9819
DRC-OCS            3        0.9861     0.9858  0.9929       5        0.9736     0.9731  0.9910

                   Three Tank (4 classes)                   Barcelona Water (32 classes)
Algorithm          Classes  Precision  Recall  Specificity  Classes  Precision  Recall  Specificity
-----------------  -------  ---------  ------  -----------  -------  ---------  ------  -----------
DBscan             14       0.8742     0.7561  0.9253       61       0.8019     0.7326  0.8654
AP-Model           272      0.9713     0.9704  0.9901       654      0.9366     0.9428  0.9751
ARMAX-OCS          5        0.9914     0.9923  0.9984       57       0.7826     0.7419  0.8237
RC-OCS             9        0.9182     0.8788  0.9596       44       0.8913     0.8942  0.9263
DRC-OCS(Sampling)  7        0.9940     0.9949  0.9988       39       0.9219     0.9310  0.9513
DRC-OCS            10       0.9931     0.9931  0.9977       48       0.9538     0.9640  0.9871

SVMs are optimized by 5-fold cross validation. The parameters in CART, Bagging
and AdaBoost follow the defaults in MATLAB.

The reported results in Table 3 are based on 10 runs of 5-fold cross validation.
In Table 3, the model space representation usually achieves a lower error rate.
In some cases, e.g. CART/SVMs in NARMA and SVMs/Bagging in the three tank
system, the model space representation even achieves 100% accuracy. These
results are consistent with the MDS visualizations, and confirm the benefits of
using the model space rather than the signal space in fault diagnosis.

In fault diagnosis, the first step is to discriminate faults from normal
situations. Table 4 reports fault detection results using a number of algorithms
listed in Table 1. The parameters related to DBscan, one-class SVM and ARMAX are
optimized by 5-fold cross validation in the normal period. In this table, fault
detection rate (FDR) and false alarm rate (FAR) are employed as two metrics.
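
As an illustration of these two metrics, the following minimal Python sketch computes them from per-window decisions. It assumes the usual definitions, which are not restated in this section: FDR is the fraction of truly faulty windows that are flagged, and FAR is the fraction of normal windows that are flagged. The function name and the toy data are hypothetical.

```python
def detection_rates(true_faulty, flagged):
    """Return (FDR, FAR) for two boolean sequences of equal length.

    Assumed definitions: FDR = TP / (TP + FN), FAR = FP / (FP + TN).
    """
    if len(true_faulty) != len(flagged):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(true_faulty, flagged))
    tp = sum(1 for t, f in pairs if t and f)          # faults that were flagged
    fn = sum(1 for t, f in pairs if t and not f)      # faults that were missed
    fp = sum(1 for t, f in pairs if not t and f)      # false alarms
    tn = sum(1 for t, f in pairs if not t and not f)  # quiet normal windows
    fdr = tp / (tp + fn) if (tp + fn) else float("nan")
    far = fp / (fp + tn) if (fp + tn) else float("nan")
    return fdr, far


# Toy example: four normal windows followed by four faulty windows.
truth = [False, False, False, False, True, True, True, True]
flags = [False, True, False, False, True, True, False, True]
fdr, far = detection_rates(truth, flags)
print(fdr, far)
```

A good detector pushes FDR towards 1 while keeping FAR close to 0, which is why both numbers are needed to compare algorithms.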

According to Table 4, model space based algorithms, such as DRC-OCS and
RC-OCS, are superior to the other algorithms. Since the deterministic reservoir
is more stable than the random reservoir and DRC makes no model assumption (see
footnote 4), DRC-OCS is better than RC-OCS and ARMAX-OCS.

Although the sampling method of DRC-OCS could potentially obtain better
estimates when the readout parameters are non-uniform, it would require dense
sampling points, i.e. a large window size m in this case, with increased
computational cost. However, due to real-time requirements and computational
restrictions, the window size should be restricted for prompt response to
faults. Hence, DRC-OCS (sampling) is often inferior to DRC-OCS.

The statistical-test based algorithm T2 acts as a baseline algorithm, and it
usually has a lower FDR and a fair FAR. DBscan and affinity propagation (AP)
are clustering based algorithms. As these clustering algorithms do not make use
of the information that the first t steps are normal, they did not perform well
in the four applications.

In a time-varying environment, there may be unanticipated fault scenarios that
have not been encountered before. In this paper, we proposed a dynamic fault
library construction framework and its application to fault isolation. These
results are reported in Table 5.

In Table 5, we first report the true number of classes and the number of
discovered classes (i.e. number of faults plus the normal class) using a number
of algorithms for each data set (see footnote 5). Then, we report the fault
isolation performance of these algorithms in terms of precision, recall and
specificity.

Since the number of discovered faults does not equal the true number of
faults, we compare each true cluster $A_i$ with the discovered clusters and
merge the clusters that have maximal overlap with $A_i$ into a pseudo-cluster
$\hat{A}_i$. The performance metrics are obtained by comparing $A_i$ and
$\hat{A}_i$.
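
The merging step can be sketched in Python as follows. This is only one reading of the procedure: each discovered cluster is assigned to the true class with which it shares the most points, and precision, recall and specificity are then macro-averaged over the true classes. The averaging scheme, the function names and the toy labels are assumptions made for illustration.

```python
from collections import Counter


def merge_to_pseudo_clusters(true_labels, found_labels):
    """Relabel every discovered cluster with the true class it overlaps most."""
    overlap = {}
    for t, c in zip(true_labels, found_labels):
        overlap.setdefault(c, Counter())[t] += 1
    best = {c: counts.most_common(1)[0][0] for c, counts in overlap.items()}
    return [best[c] for c in found_labels]


def isolation_metrics(true_labels, pseudo_labels):
    """Macro-averaged (precision, recall, specificity) over the true classes."""
    pairs = list(zip(true_labels, pseudo_labels))
    precisions, recalls, specificities = [], [], []
    for k in sorted(set(true_labels)):
        tp = sum(1 for t, p in pairs if t == k and p == k)
        fp = sum(1 for t, p in pairs if t != k and p == k)
        fn = sum(1 for t, p in pairs if t == k and p != k)
        tn = len(pairs) - tp - fp - fn
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        specificities.append(tn / (tn + fp) if tn + fp else 0.0)
    n = len(precisions)
    return sum(precisions) / n, sum(recalls) / n, sum(specificities) / n


# Toy example: 3 true classes, but 5 clusters were discovered.
true_labels = ["normal"] * 4 + ["fault1"] * 4 + ["fault2"] * 4
found_labels = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3, 4]
pseudo = merge_to_pseudo_clusters(true_labels, found_labels)
precision, recall, specificity = isolation_metrics(true_labels, pseudo)
print(pseudo)
print(round(precision, 4), round(recall, 4), round(specificity, 4))
```

In the toy data, discovered clusters 1 and 2 are both merged into the pseudo-cluster of `fault1`, so an algorithm is not penalised merely for splitting one true fault into several segments.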

Based on Table 5, DRC-OCS usually outperforms the other algorithms under
these three metrics. AP-Model performs well in the isolation stage, but it often
generates too many sub-faults in the library, e.g. 270 sub-faults versus 2
faults.

4 The ARMAX model assumes a model order, and ARMAX-OCS might not perform well on
signals for which this model assumption is incorrect.

5 Since the types of faults are assumed to be unknown in advance, the compared
algorithms often discover more faults than the true number of faults by
decomposing each true fault into a number of small fault segments.

Among the three "learning in the model space" approaches, i.e. DRC-OCS,
RC-OCS and ARMAX-OCS, DRC-OCS is the best and ARMAX-OCS is the weakest, as it
requires model order selection for each application. Without prior information
for complex applications, it is usually difficult to select the model order.
With limited sampling points due to the real-time requirement, the sampling
method of DRC-OCS is often inferior to DRC-OCS, though it often outperforms the
other approaches.

Based on the results presented in Tables 3, 4 and 5, the proposed approach
DRC-OCS achieves the best results, and these results also confirm that
"learning in the model space" is an effective framework for fault diagnosis.

5 Conclusion 

In this paper, an effective cognitive fault diagnosis framework has been
proposed to tackle the challenges in complex engineering systems in time-varying
or unformulated environments. Instead of investigating fault diagnosis in the
signal space, this paper introduces the "learning in the model space" framework,
which represents the multiple-input and multiple-output (MIMO) data as a series
of models fitted using a rolling window. By investigating the characteristics of
these fitted models using learning approaches in the model space, we can
identify and isolate faults effectively, and dynamically construct a fault
library.
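
The rolling-window idea can be illustrated with a deliberately simple stand-in for the reservoir model. The Python sketch below assumes a scalar input u and output y and fits a static linear model y = a * u + b to each window by least squares, so that every window becomes a point (a, b) in the model space. The linear model, the function names and the synthetic data are assumptions for illustration only and are not the fitting model used in this paper.

```python
def fit_linear(u, y):
    """Least-squares fit of y = a * u + b on one window; returns (a, b)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    suu = sum((ui - mean_u) ** 2 for ui in u)
    suy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    a = suy / suu if suu else 0.0  # guard against a constant input window
    b = mean_y - a * mean_u
    return a, b


def rolling_models(u, y, window, step):
    """Fit one model per window; each (a, b) is a point in the model space."""
    starts = range(0, len(u) - window + 1, step)
    return [fit_linear(u[s:s + window], y[s:s + window]) for s in starts]


# Synthetic data: healthy behaviour y = 2u + 1, then an additive offset
# fault (y = 2u + 1.5) from time step 20 onwards.
u = [float(t % 5) for t in range(40)]
y = [2.0 * ut + (1.0 if t < 20 else 1.5) for t, ut in enumerate(u)]
models = rolling_models(u, y, window=10, step=5)
for a, b in models:
    print(round(a, 3), round(b, 3))
```

The fitted offsets move from 1.0 through an intermediate value for the window that straddles the fault onset to 1.5, so the fault appears as a displacement of the fitted models rather than as a feature of the raw signal.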

This contribution applies deterministic reservoir models to fit the MIMO
data, since reservoir models are generic enough to fit a wide variety of
dynamical features of the input-driven signals, and the deterministic reservoir
models further simplify the model structure and thus improve the fitting
performance.

To rigorously investigate these fitted models for fault diagnosis, this paper
demonstrates the application of the distance definition in the model space for
linear readout models. The model distance differs from the squared Euclidean
distance of the readout parameters, indicating that more importance is given to
the 'offset' than to the 'orientation' of the readout mapping. We also present
the estimated forms of the model distance using either sampling methods or a
Gaussian mixture model when the domain of readout parameters is non-uniform.
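
To make the role of the offset concrete, the Python sketch below considers two scalar linear readouts f(x) = w.x + a and integrates their squared difference over the inputs. It assumes, for illustration only, that the input x is uniformly distributed over [-1, 1]^N; under that assumption the integral equals 2^N (||w1 - w2||^2 / 3 + (a1 - a2)^2), so a unit change in the offset counts three times as much as a unit change in a weight. The function names are hypothetical, and the Monte Carlo estimate only mimics the idea of a sampling-based estimate.

```python
import random


def closed_form_sq_distance(w1, a1, w2, a2):
    """Integral of ((w1 - w2).x + (a1 - a2))^2 over x in [-1, 1]^N."""
    n = len(w1)
    dw_sq = sum((p - q) ** 2 for p, q in zip(w1, w2))
    da = a1 - a2
    return (2 ** n) * (dw_sq / 3.0 + da ** 2)


def sampled_sq_distance(w1, a1, w2, a2, samples=50_000, seed=0):
    """Monte Carlo estimate of the same integral (assumed uniform inputs)."""
    rng = random.Random(seed)
    n = len(w1)
    total = 0.0
    for _ in range(samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        diff = sum((p - q) * xi for p, q, xi in zip(w1, w2, x)) + (a1 - a2)
        total += diff * diff
    return (2 ** n) * total / samples  # volume of the cube times the mean


# Same Euclidean change (one unit), applied to a weight and to the offset.
d_weight = closed_form_sq_distance([1.0, 0.0], 0.0, [0.0, 0.0], 0.0)
d_offset = closed_form_sq_distance([0.0, 0.0], 1.0, [0.0, 0.0], 0.0)
d_sampled = sampled_sq_distance([1.0, 0.0], 0.0, [0.0, 0.0], 0.0)
print(d_weight, d_offset, d_sampled)
```

The two readout pairs are equally far apart in squared Euclidean distance of their parameters, yet the pair that differs in offset is three times farther apart as functions.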

By replacing the data distance matrix with the model distance matrix,
one-class SVMs are able to "learn" in the model space to identify
normal/abnormal models. To accommodate unknown faults, the algorithm
"incremental one-class learning in the model space" is proposed to identify and
isolate faults, and simultaneously construct the fault library.
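
The control flow of such an incremental library can be sketched as follows. This Python sketch swaps the one-class SVMs for a much simpler rule, a fixed-radius test around a representative model, and uses the plain squared Euclidean distance as a placeholder for the model distance. The function names, the radius and the toy models are hypothetical; only the flow (normal, known fault, or new library entry) is meant to mirror the description above.

```python
def sq_dist(m1, m2):
    """Placeholder distance between two models given as parameter tuples."""
    return sum((p - q) ** 2 for p, q in zip(m1, m2))


def build_fault_library(models, normal_models, radius):
    """Label each model as normal, a known fault, or a newly created fault."""
    centre = [sum(col) / len(normal_models) for col in zip(*normal_models)]
    library = []  # one representative model per discovered fault
    labels = []
    for m in models:
        if sq_dist(m, centre) <= radius ** 2:
            labels.append("normal")
            continue
        for idx, rep in enumerate(library):
            if sq_dist(m, rep) <= radius ** 2:
                labels.append(f"fault-{idx}")  # matches a known fault
                break
        else:
            library.append(m)  # unknown fault: add it to the library
            labels.append(f"fault-{len(library) - 1}")
    return labels, library


# Toy models (a, b): a healthy regime near (2, 1) and two kinds of fault.
normal_models = [(2.0, 1.0), (2.0, 1.02), (1.98, 1.0)]
stream = [(2.0, 1.0), (2.0, 1.5), (2.01, 1.52),
          (1.2, 1.0), (2.0, 1.01), (1.21, 0.99)]
labels, library = build_fault_library(stream, normal_models, radius=0.1)
print(labels)
print(library)
```

The first model that falls outside the normal region and outside every library entry creates a new fault class, and later models near it are isolated as the same fault.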

To compare the proposed framework with other related fault diagnosis
approaches, three benchmark systems and one simulated model for the Barcelona
water system have been employed. The results confirm both the benefits of
representing MIMO data in the model space and the effectiveness of the
"learning in the model space" framework.

"Learning in the model space" is an effective framework for complex data
representation and fault diagnosis. Instead of using reservoir models and
one-class SVMs as the fitting and discriminating models, respectively, there
should be other effective options or combinations for various application
systems, which constitute our future work.